2.0 Summary of Technical Results

Our goals this year involved learning in connectionist networks while
automatically decomposing behaviors in order to support those behaviors with
modular architectures. While there has been some work in this area (e.g., Jacobs,
Jordan & Barto, 1991), we desired to have the modules fully evolve in response to
the demands of the task. To accomplish this, we needed a training mechanism
more robust than backpropagation, so we turned towards genetic algorithms
(GAs). These algorithms, based on principles adapted from natural selection,
allow solutions to be evolved which fit the requirements of an environment.

There is an extensive body of work applying GAs to evolving neural networks,
but most simply use GAs to set the weights for a fixed-structure network. Those
that attempt to evolve network structure do so in a very limited way. (See
Schaffer, et al., 1992 for a good overview.) Thus before applying GAs to network
modularization, we first had to solve the "generalized network acquisition"
problem, i.e., the problem of acquiring both network structure and weight values
simultaneously. The result was GNARL, an algorithm for GeNeralized Acquisition
of Recurrent Links (Angeline, Saunders, and Pollack, 1993). In contrast to
constructive and destructive algorithms for network induction (e.g., Fahlman and
Lebiere, 1990; Fahlman, 1991; Chen, et al., 1993; Mozer & Smolensky, 1989; Omlin
& Giles, 1993), GNARL is an "unsupervised" form of network learning using a
population of mutating networks where the severity of mutation is modulated by
feedback from a fitness function: as a network approaches the goal, its chances
of reproducing without modification decrease towards 0.

The  power  of  GNARL  has  been  demonstrated  on  several  tasks,  including  the 
Tomita  language  acquisition  task  (Pollack,  1991).  Here  we  will  highlight  the 
Tracker  task,  described  by  Jefferson,  et  al.  (1991).  In  this  problem,  a  simple  agent  (a 
simulated  ant)  is  placed  on  a  two-dimensional  toroidal  grid  that  contains  a  trail  of 
food.  The  ant  traverses  the  grid,  eating  in  one  time  step  any  food  it  contacts.  The 
goal  of  the  task  is  to  maximize  the  number  of  pieces  of  food  the  ant  eats  within  a 
predefined allotted time. The trail of food used in the experiment (shown in figure

Figure 1: The ant problem. (a) The trail is connected initially, but becomes progressively more
difficult to follow. The underlying 2-d grid is toroidal, so that position "P" is the first break in
the trail. (b) The semantics of the I/O units for the ant network. The first input node denotes the
presence of food in the square directly in front of the ant; the second denotes the absence of food
in this same square. No-op, from Jefferson, allows the network to stay in one position while
activation flows through recurrent links. The network in (b) "eats" 42 pieces of food before
spinning endlessly in place at position P. The simple FSA in (c) is a handcrafted solution which
moves if there is food, otherwise turns and looks in 4 directions before continuing to move.


Following Jefferson, et al. (1991), the ant is controlled by a network with two input
nodes and four output nodes, as shown in figure 1b. The first input node denotes
the presence of food in the square directly in front of the ant; the second denotes
the absence of food in this same square, restricting the possible legal inputs to the
network to (1, 0) or (0, 1). Each of the four output units corresponds to a unique
executable action: move forward, turn left, turn right, or no-op. Each ant is in an
implicit sense/act loop that repeatedly sets the input activations of the network,
computes the activations of the output nodes, determines the output node with
maximum activation and executes its associated action. Every application of the
sense/act loop happens in a single time step. Once a position with food is
visited, the food is removed. The fitness function used in this task is simply the
number of food units consumed in 200 time steps.
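
The sense/act loop and fitness measure described above can be sketched in a few lines of Python. This is an illustration only, not the simulator used in the experiments: the 8 x 8 grid, the short `TRAIL`, the `Ant` class, and the helper names are assumptions made for the example, and the controller is a hand-written stand-in (move if food is ahead, otherwise look in four directions and then move) rather than an evolved recurrent network.

```python
from dataclasses import dataclass

# Actions correspond to the four output units described in the text.
MOVE, LEFT, RIGHT, NOOP = range(4)
# Headings as (dx, dy) with y growing downward: east, south, west, north.
HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


@dataclass
class Ant:
    x: int = 0
    y: int = 0
    heading: int = 0  # index into HEADINGS


def argmax(values):
    """Index of the largest activation (the first one wins ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def run_ant(controller, food, width, height, steps=200):
    """Run the sense/act loop and return the number of food units eaten."""
    food = set(food)  # copy, so the caller's trail is not modified
    ant = Ant()
    eaten = 0
    for _ in range(steps):
        dx, dy = HEADINGS[ant.heading]
        ahead = ((ant.x + dx) % width, (ant.y + dy) % height)  # toroidal wrap
        # Two input units: (1, 0) if food is directly ahead, (0, 1) otherwise.
        inputs = (1, 0) if ahead in food else (0, 1)
        action = argmax(controller(inputs))
        if action == MOVE:
            ant.x, ant.y = ahead
            if ahead in food:
                food.remove(ahead)
                eaten += 1
        elif action == LEFT:
            ant.heading = (ant.heading - 1) % 4
        elif action == RIGHT:
            ant.heading = (ant.heading + 1) % 4
        # NOOP: the ant stays in place for this time step.
    return eaten


def make_fsa_controller():
    """Stand-in controller: move if food is ahead, else turn 4 times, then move."""
    turns = 0  # consecutive turns made without seeing food

    def controller(inputs):
        nonlocal turns
        if inputs[0] == 1 or turns == 4:
            turns = 0
            return [1.0, 0.0, 0.0, 0.0]  # highest activation on MOVE
        turns += 1
        return [0.0, 0.0, 1.0, 0.0]      # highest activation on RIGHT

    return controller


# Hypothetical trail: four pieces heading east, then a corner and three heading south.
TRAIL = [(1, 0), (2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3)]
fitness = run_ant(make_fsa_controller(), TRAIL, width=8, height=8, steps=200)
```

Any function that maps the two input activations to four output activations can be passed as `controller`, so an evolved network could be dropped in without changing the loop.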

In these experiments, we used a population of 100 networks. In the first run (2090
generations using 104,600 network evaluations), GNARL found a network that
cleared 81 grid positions within the 200 time steps. When we allowed this ant to
run for an additional 119 time steps, it successfully cleared the entire trail.


Analysis of the behavior of the network under conditions of constant food inputs
and constant no-food inputs revealed that it had the same behavior as a simple
5-state automaton designed by Jefferson (figure 1c). In an empty torus, it would
simply move across a line forever. GNARL was not constrained to find FSAs,
however. On another run it evolved a network that cleared 82 grid points within the
200 time steps. Figure 2 shows the limit behavior of this network under different
input conditions. Each graph shows a collection of the output units as a point in
the 3D space (move, right, left), since no-op is never used. Under the condition of
constant food (a), it moves. In the case of a no-food condition, instead of spinning
in one place, or rotating 4 times and moving, this network makes a complicated
pseudo-random series of turns and moves, which shows up as the "D" shape in
the graph, and enables it to explore more of the empty torus. Figure 2(c) shows
the output of the network on the given trail, and figure 2(d) plots the movement
of the agent over time on an empty grid.


2.1  Cooperating  Agents 

A  natural  next  question  to  ask  is  whether  such  simple  evolved  agents  can  behave 
in  a  cooperative  manner.  The  issue  of  evolved  cooperation  has  been  the  subject  of 
intense research since Axelrod (1984) showed that two agents engaged in a
prisoner's dilemma (Poundstone, 1992) can evolve into mutually cooperative
states. Recent studies explore alternatives to Axelrod's Tit-for-Tat strategy
(Nowak and Sigmund, 1993); other research extends the two-way interactions by
introducing multiple agents. Such work is still far from complete: some agents
are completely designed (e.g., Drogoul and Ferber, 1993); some have
homogeneous structure and perform only parametric learning (e.g., Assad and
Packard, 1992); and finally some focus on simple tasks (e.g., Deneubourg, et al.,
1991).

Our most recent work is the study of how to evolve cooperation among
heterogeneous agents. Our thesis is that a set of such individuals, associated only
loosely by being part of the same group, and communicating only indirectly via the
environment, should be able to learn to cooperate to solve a given task.

Figure 2: Limit behavior of a network for the ant problem. Graphs show the state of the
output units Move, Right, Left. (a) Fixed point attractor that results for a sequence of 500
"food" signals; (b) Limit cycle attractor that results when a sequence of 500 "nofood"
signals is given to the network; (c) All states visited while traversing the trail; (d) The x
position of the ant over time when run on an empty grid.


Our agents are controlled by recurrent neural networks, and evolved using
GNARL. Because GNARL places no restrictions on the number of hidden nodes
of a network or on the network's connectivity, it is an ideal system in which to
build cooperative agents with heterogeneous structure.

Motivated by biology, many researchers have expanded upon the ant domain to
explore cooperative problem-solving in insect-like agents (e.g., Colorni, et al.,
19^; Drogoul, et al., 1992; Deneubourg, et al., 1991; Theraulaz, et al., 1991). We
have chosen an alternate domain, one which we feel should serve as a rich
substrate for our theories of cooperative evolution: soccer.

A competitive team sport like soccer is a complex task. Cooperation is much more
direct than in insect "food-collection" tasks, which can often be seen as simple
questions of parallelism: players must coordinate defensively to protect the goal
by covering different opposition team members, and must coordinate offensively
to score. Passing, the most critical aspect of the game, cannot be accomplished
unless both the kicker and the recipient work closely together. Additionally, there
is the added complication of competition between opposing teams.

The dual elements of cooperation and competition make credit assignment in
soccer particularly difficult. Consider evaluating a given agent. Even if it is the
best player on the field, its team may lose. Likewise, if it is the worst player on the
field, its team may win. Moreover, the game may result in a tie. How is the fitness
of each team (player) evaluated in this case? Taking motivation from the
principles of soccer, we are currently exploring two key factors: ball possession
(the amount of time a team controls the ball) and support (the amount of aid a
team gives an agent who controls the ball).
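
To make the two factors concrete, the sketch below scores a team from a per-time-step game log. It is purely illustrative: the text does not give formulas, so the log format, the reading of "support" as the average number of open teammates while a team holds the ball, the weights, and the name `team_fitness` are all assumptions of this example.

```python
def team_fitness(log, team, w_possession=1.0, w_support=1.0):
    """Score `team` from a log of (holder_team, open_teammates) per time step.

    holder_team    -- which team controls the ball on that step (or None)
    open_teammates -- how many of the holder's teammates are open for a pass
    Both the log format and the weighted sum are assumptions of this sketch.
    """
    if not log:
        return 0.0
    held = [open_mates for holder, open_mates in log if holder == team]
    possession = len(held) / len(log)                 # fraction of time with the ball
    support = sum(held) / len(held) if held else 0.0  # mean open teammates while holding
    return w_possession * possession + w_support * support


# Hypothetical four-step game log.
LOG = [("circle", 2), ("circle", 0), ("square", 1), (None, 0)]
circle_score = team_fitness(LOG, "circle")
square_score = team_fitness(LOG, "square")
```

A score of this kind is defined even when a game ends in a tie, which is the case the paragraph above singles out as hard to evaluate.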

At this stage in our work, we have developed a set of agent sensors and actuators
that is sufficient for cooperation in soccer, shown in Tables 1 and 2.

As a proof of concept, we have both designed by hand, and evolved by network,
several soccer players using these primitives. To demonstrate, a set of snapshots
of a game is shown in Figure 3. Play begins with the Circle team, defending the
lower goal, and the Square team, defending the upper (Figure 3, top-left).
Members of both teams execute move-to-ball (top-right), but because players
are heterogeneous, player 4 of the Square team is fastest and reaches the ball first
(bottom-left). Sensors left-open, forward-open, and right-open return negative
results, so player 4 passes back to its teammate, player 7 (bottom-right). Beyond
the scenes depicted in the figure, player 7 passes back to player 6, who then
passes up to player 4, who dribbles forward but eventually loses control of the
ball, demonstrating that our chosen primitives support both cooperation and
competition.
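
A hand-designed player of the kind mentioned above can be written as a small rule table over the sensor and actuator names of Tables 1 and 2. The rule ordering below is a hypothetical example, not one of the players used in our studies; the helper `reading` and the function `choose_action` are names introduced only for this sketch.

```python
SENSORS = ("touching-ball", "goal-open", "left-open", "forward-open",
           "right-open", "open-teammate-p", "better-teammate-p", "we-possess")


def reading(*active):
    """Build a full sensor dict with the named sensors on and all others off."""
    return {name: name in active for name in SENSORS}


def choose_action(s):
    """Pick one actuator name from a dict of boolean sensor readings."""
    if not s["touching-ball"]:
        # Without the ball: support a teammate in possession, else chase the ball.
        return "get-open" if s["we-possess"] else "move-to-ball"
    if s["goal-open"]:
        return "shoot-to-goal"
    if s["better-teammate-p"]:
        return "pass"
    if s["forward-open"]:
        return "dribble-forward"
    if s["left-open"]:
        return "dribble-left"
    if s["right-open"]:
        return "dribble-right"
    # Boxed in on all sides: pass if anyone is open, otherwise trap the ball.
    return "pass" if s["open-teammate-p"] else "try-to-trap"


# Player 4's situation in the snapshots: on the ball, no open space, a teammate open.
player_4 = reading("touching-ball", "open-teammate-p", "we-possess")
action = choose_action(player_4)
```

With left-open, forward-open, and right-open all negative, this rule table selects `pass`, matching the bottom-right snapshot of Figure 3.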

Table 1: Soccer Sensors

Sensor               Meaning
touching-ball        player is touching ball
goal-open            player has an open shot at goal
left-open            area to left of player is clear
forward-open         area in front of player is clear
right-open           area to right of player is clear
open-teammate-p      player has a teammate open
better-teammate-p    player has a teammate open who is "better" (closer to goal or more open)
we-possess           our team has the ball

Table 2: Soccer Actuators

Actuator             Meaning
move-to-ball         move one step in direction of ball
shoot-to-goal        kick the ball hard to the goal
dribble-forward      kick the ball gently forward
dribble-left         kick the ball gently to the left
dribble-right        kick the ball gently to the right
pass                 pass the ball to the best open teammate
get-open             move to an open position
try-to-trap          stop the ball from moving

There are several reasons for the relatively high level of modeling we have
chosen. First, to explore cooperation in soccer, something like our sensors/
behaviors must exist. Had we chosen a lower-level set of primitives, concepts
such as "goal-open" would still have had to emerge. The situation is analogous to
the instructions for coaches in introductory soccer books: they begin with "how to
kick the ball," then progress to "how to pass", and maybe in the last chapter
quickly discuss team-level strategies (e.g., Hargreaves, 1990; McGettigan, 1987).
Since we are mostly interested in team-level issues, we chose a high-level set of
primitives for our starting point. Secondly, the starting level of abstraction is
largely irrelevant; what is important is the distance between the starting
primitives and the ultimate behavior. Finally, the level we have chosen is still too
complex to completely solve the task by hand. We have already modeled soccer
players with these primitives, and even designed expert players, but in
preliminary studies, the evolved networks often outperformed the hand-built
experts. It is difficult to predict how heterogeneous players will work together.


Distancing ourselves from this particular game domain, future work should
provide deeper insights into the artificial evolution of cooperation and
complexity. How should the fitness of a group of agents acting cooperatively be
evaluated? Is it necessary to assign fitness to individual players, or will group
scoring be enough? Either less able players will luckily be weeded out by the
competitive process of teams, or they will hitchhike along. To evolve cooperating
agents, must each player always cooperate, or is there enough flexibility to allow
for the selfish behavior found in groups (e.g., show-off players)? What are the
dynamics in such open games which create and "lock in" evolutionarily stable
but mediocre strategies, and how can that lock be broken? We are now poised to
ask these and other questions; a team competition as modelled by our soccer
simulation is a fertile ground for research into cooperative and competitive teams
of heterogeneous agents.

Figure 3: Snapshots of passing. Top-left: start of game. Top-right: movement to ball.
Bottom-left: Player 4 poised to pass. Bottom-right: player 4 has passed back to player 7.
3.0 Publication Activity

3.1 Refereed Papers

1. Angeline, P., Saunders, G., & Pollack, J. (in press). An evolutionary algorithm that
constructs recurrent neural networks. IEEE Trans. on Neural Nets.

2. Saunders, G., Angeline, P., & Pollack, J. (accepted). Structural and behavioral evolution
of recurrent networks. Neural Information Processing Systems Conference.

3.2 Presentations on this work

1. Saunders presented work at the Midwest Connectfest III, held November 1992 at
Carnegie-Mellon.

2. Angeline & Saunders presented this work in workshops at the International Conference
on Genetic Algorithms, July 1993, at University of Illinois.

3. Pollack was external faculty at Santa Fe Institute in Nov. 1992, and presented some of
this work.

4. Pollack was keynote speaker ("Only 99 (hundred million) lines of code to go") at the
Midwest Artificial Intelligence and Cognitive Science meeting, April 1993 in Indiana
Dunes.
4.0 Transitions & Dept. of Defense Interaction

5.0 Software Prototypes

We have built and released an efficient set of software tools for research in
recurrent neural nets. For ease of construction, we implemented the system in
Common Lisp for its flexibility. But in order to make the resultant system more
efficient, we circumscribed all numerically intensive subroutines in a small
well-defined library of linear algebra routines. This has been implemented both in
Lisp and in C, but the foreign function interface is specific to the foreign function
capacity of the "Allegro" brand of Common Lisp, and is not completely portable.
We have supplemented these libraries with new modes of learning based on
genetic algorithms and genetic programming.

The Lisp software is available to interested researchers by FTP free of charge.
Please contact ansil@cis.ohio-state.edu for further information.
Abstract

Standard methods for inducing both the structure and weight values of recurrent neural
networks fit an assumed class of architectures to every task. This simplification is
necessary because the interactions between network structure and function are not well
understood. Evolutionary computation, which includes genetic algorithms and evolutionary
programming, is a population-based search method that has shown promise in such
complex tasks. This paper argues that genetic algorithms are inappropriate for network
acquisition and describes an evolutionary program, called GNARL, that simultaneously
acquires both the structure and weights for recurrent networks. This algorithm's empirical
acquisition method allows for the emergence of complex behaviors and topologies that are
potentially excluded by the artificial architectural constraints imposed in standard network
induction methods.


1.0  Introduction 

In its complete form, network induction entails both parametric and structural learning [1],
i.e., learning both weight values and an appropriate topology of nodes and links. Current methods
to solve this task fall into two broad categories. Constructive algorithms initially assume a simple
network and add nodes and links as warranted [2-8], while destructive methods start with a large
network and prune off superfluous components [9-12]. Though these algorithms address the
problem of topology acquisition, they do so in a highly constrained manner. Because they
monotonically modify network structure, constructive and destructive methods limit the traversal
of the available architectures in that once an architecture has been explored and determined to be
insufficient, a new architecture is adopted, and the old becomes topologically unreachable. Also,
these methods often use only a single predefined structural modification, such as "add a fully
connected hidden unit," to generate successive topologies. This is a form of structural hill
climbing, which is susceptible to becoming trapped at structural local minima. In addition,
constructive and destructive algorithms make simplifying architectural assumptions to facilitate
network induction. For example, Ash [2] allows only feedforward networks; Fahlman [6] assumes
a restricted form of recurrence, and Chen et al. [7] explore only fully connected topologies. This
creates a situation in which the task is forced into the architecture rather than the architecture
being fit to the task.

These deficiencies of constructive and destructive methods stem from inadequate methods for
assigning credit to structural components of a network. As a result, the heuristics used are overly
constrained to increase the likelihood of finding any topology to solve the problem. Ideally, the
constraints for such a solution should come from the task rather than be implicit in the algorithm.

This paper presents GNARL, a network induction algorithm that simultaneously acquires both
network topology and weight values while making minimal architectural restrictions and avoiding
structural hill climbing. The algorithm, described in section 3, is an instance of evolutionary
programming [13, 14], a class of evolutionary computation that has been shown to perform well at
function optimization. Section 2 argues that this class of evolutionary computation is better suited
for evolving neural networks than genetic algorithms [15, 16], a more popular class of
evolutionary computation. Finally, section 4 demonstrates GNARL's ability to create recurrent
networks for a variety of problems of interest.

2.0  Evolving  Connectionist  Networks 

Evolutionary computation provides a promising collection of algorithms for structural and
parametric learning of recurrent networks [17]. These algorithms are distinguished by their
reliance on a population of search space positions, rather than a single position, to locate extrema
of a function defined over the search space. During one search cycle, or generation, the members of
the population are ranked according to a fitness function, and those with higher fitness are
probabilistically selected to become parents in the next generation. New population members,
called offspring, are created using specialized reproduction heuristics. Using the population,
reproduction heuristics, and fitness function, evolutionary computation implements a nonmonotonic
search that performs well in complex multimodal environments. Classes of evolutionary computation
can be distinguished by examining the specific reproduction heuristics employed.
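
The generational cycle just described (rank, select parents probabilistically, create offspring) can be sketched as follows. This is a generic illustration rather than any specific published algorithm: rank-proportional selection, the toy fitness function, the Gaussian mutation used as the reproduction heuristic, and all function names are assumptions of the example.

```python
import random


def evolve(fitness, reproduce, population, generations, rng):
    """Generic generational loop: rank, select parents by rank, create offspring."""
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        # Rank-based weights: the fittest member gets the largest selection weight.
        weights = [len(ranked) - i for i in range(len(ranked))]
        parents = rng.choices(ranked, weights=weights, k=len(ranked))
        population = [reproduce(p, rng) for p in parents]
        # Remember the best individual seen in any generation.
        best = max(population + [best], key=fitness)
    return population, best


def toy_fitness(x):
    """Toy objective with a single maximum at x = 3."""
    return -(x - 3.0) ** 2


def mutate(parent, rng):
    """Reproduction heuristic for the toy problem: add Gaussian noise."""
    return parent + rng.gauss(0.0, 0.5)


rng = random.Random(0)
start = [rng.uniform(-10.0, 10.0) for _ in range(20)]
final_population, best = evolve(toy_fitness, mutate, start, generations=50, rng=rng)
```

Because weaker members can still be chosen as parents and offspring may be worse than their parents, the population is free to move downhill temporarily, which is what makes the search nonmonotonic. Swapping `reproduce` for a different heuristic changes the class of evolutionary computation without touching the loop.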

Genetic algorithms (GAs) [15, 16] are a popular form of evolutionary computation that rely
chiefly on the reproduction heuristic of crossover.¹ This operator forms offspring by recombining
representational components from two members of the population without regard to content. This
purely structural approach to creating novel population members assumes that components of all
parent representations may be freely exchanged without inhibiting the search process.
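
A minimal sketch of crossover on bit strings shows how purely structural the operator is. One-point crossover is used here as a representative variant; the function name and the example parents are assumptions of this illustration.

```python
import random


def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length bit strings at a random cut point."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    cut = rng.randrange(1, len(a))  # cut strictly inside the string
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


child_1, child_2 = one_point_crossover("11111111", "00000000", random.Random(1))
```

The operator never inspects what the bits mean: each child is simply a prefix of one parent joined to the suffix of the other.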

Various combinations of GAs and connectionist networks have been investigated. Much
research concentrates on the acquisition of parameters for a fixed network architecture (e.g.,
[18-21]). Other work allows a variable topology, but disassociates structure acquisition from
acquisition of weight values by interweaving a GA search for network topology with a traditional
parametric training algorithm (e.g., backpropagation) over weights (e.g., [22, 23]). Some studies
attempt to coevolve both the topology and weight values within the GA framework, but as in the
connectionist systems described above, the network architectures are restricted (e.g., [24-26]). In
spite of this collection of studies, current theory from both genetic algorithms and connectionism
suggests that GAs are not well-suited for evolving networks. In the following section, the reasons
for this mismatch are explored.


1. Genetic algorithms also employ other operators to manipulate the population, including a form of mutation, but
their distinguishing feature is a heavy reliance on crossover.

Figure 1. The dual representation scheme used in genetic algorithms. The interpretation function maps
between the elements in recombination space on which the search is performed and the subset of structures
that can be evaluated as potential task solutions.


2.1  Evolving  Networks  with  Genetic  Algorithms 

Genetic algorithms create new individuals by recombining the representational components of
two members of the population. Because of this commitment to structural recombination, GAs
typically rely on two distinct representational spaces (Figure 1). Recombination space, usually
defined over a set of fixed-length binary strings, is the set of structures to which the genetic
operators are applied. It is here that the search actually occurs. Evaluation space, typically
involving a problem-dependent representation, is the set of structures whose ability to perform a
task is evaluated. In the case of using GAs to evolve networks, evaluation space is comprised of a
set of networks. An interpretation function maps between these two representational spaces. Any
set of finite-length bit strings cannot represent all possible networks, thus the evaluation space is
restricted to a predetermined set of networks. By design, the dual representation scheme allows
the GA to crossover the bit strings without any knowledge of their interpretation as networks. The
implicit assumption is that the interpretation function is defined so that the bit strings created by
the dynamics of the GA will map to successively better networks.
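
As a concrete, deliberately simple example of an interpretation function, the sketch below maps a fixed-length bit string to a connectivity matrix. The encoding (one bit per possible link among a fixed number of units, with weights omitted) and the name `interpret` are assumptions of this illustration, not a scheme taken from the cited studies.

```python
def interpret(bits, n_units):
    """Map a bit string from recombination space to a network in evaluation space.

    Bit i * n_units + j set to "1" means there is a link from unit i to unit j.
    """
    if len(bits) != n_units * n_units:
        raise ValueError("expected exactly n_units**2 bits")
    return [[bits[i * n_units + j] == "1" for j in range(n_units)]
            for i in range(n_units)]


# A 3-unit ring, 0 -> 1 -> 2 -> 0, written one row of the matrix at a time.
net = interpret("010" "001" "100", n_units=3)

# Every 9-bit string maps to some network with exactly 3 units, so only
# 2**9 connectivity patterns are reachable; larger networks are excluded.
reachable = 2 ** (3 * 3)
```

Even in this toy encoding the restriction noted above is visible: the choice of string length fixes the set of networks the search can ever evaluate.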

The dual representation of GAs is an important feature for searching in certain environments.
For instance, when it is unclear how to search the evaluation space directly, and when there exists
an interpretation function such that searching the space of bit strings by crossover leads to good
points in evaluation space, then the dual representation is ideal. It is unclear, however, that there
exists an interpretation function that makes dual representation beneficial for evolving neural
networks. Clearly, the choice of interpretation function introduces a strong bias into the search,
typically by excluding many potentially interesting and useful networks (another example of forcing
the task into an architecture). Moreover, the benefits of having a dual representation hinge on
crossover being an appropriate evolutionary operator for the task for some particular
interpretation function; otherwise, the need to translate between dual representations is an
unnecessary complication.

Figure 2. The competing conventions problem [29]. Bit strings A and B map to structurally and computationally
equivalent networks that assign the hidden units in different orders. Because the bit strings are distinct, crossover
is likely to produce an offspring that contains multiple copies of the same hidden node, yielding a network with
less computational ability than either parent.

Characterizing tasks for which crossover is a beneficial operator is an open question. Current
theory suggests that crossover will tend to recombine short, connected substrings of the bit string
representation that correspond to above-average task solutions when evaluated [16, 15]. These
substrings are called building blocks, making explicit the intuition that larger structures with high
fitness are built out of smaller structures with moderate fitness. Crossover tends to be most
effective in environments where the fitness of a member of the population is reasonably correlated
with the expected ability of its representational components [27]. Environments where this is not
true are called deceptive [28].

There are three forms of deception when using crossover to evolve connectionist networks.
The first involves networks that share both a common topology and common weights. Because
the interpretation function may be many-to-one, two such networks need not have the same bit
string representation (see Figure 2). Crossover will then tend to create offspring that contain
repeated components, and lose the computational ability of some of the parents' hidden units. The
resulting networks will tend to perform worse than their parents because they do not possess key
computational components for the task. Schaffer et al. [29] term this the competing conventions
problem, and point out that the number of competing conventions grows exponentially with the
number of hidden units.
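
The competing conventions problem can be reproduced with a two-hidden-unit network. In the sketch below, the gene layout (one block of four numbers per hidden unit), the weight values, the `tanh` units, and the crossover point at the block boundary are all assumptions chosen to make the effect easy to see.

```python
import math

# Each hidden unit is one gene block:
# (weight from input 1, weight from input 2, bias, output weight).
# The numbers are arbitrary illustrative values.
H1 = (1.0, -1.0, 0.0, 2.0)
H2 = (-1.0, 1.0, 0.5, -3.0)


def output(genome, x1, x2):
    """Feedforward net: sum over hidden units of out_w * tanh(w1*x1 + w2*x2 + b)."""
    return sum(out_w * math.tanh(w1 * x1 + w2 * x2 + b)
               for (w1, w2, b, out_w) in genome)


parent_a = [H1, H2]  # one convention: H1 listed first
parent_b = [H2, H1]  # the same network with its hidden units in the other order

# One-point crossover at the boundary between the two gene blocks.
child_1 = parent_a[:1] + parent_b[1:]  # two copies of H1
child_2 = parent_b[:1] + parent_a[1:]  # two copies of H2
```

Both parents compute the same function, yet each child carries a duplicated hidden unit and has lost the other one entirely.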

The second form of deception involves two networks with identical topologies but different
weights. It is well known that for a given task, a single connectionist topology affords multiple
solutions, each implemented by a unique distributed representation spread across the
hidden units [30, 31]. While the removal of a small number of nodes has been shown to effect
only minor alterations in the performance of a trained network [30, 31], the computational role
each node plays in the overall representation of the task solution is determined purely by the
presence and strengths of its interconnections. Furthermore, there need be no correlation between
distinct distributed representations over a particular network architecture for a given task. This
seriously reduces the chance that an arbitrary crossover operation between distinct distributed
representations will construct viable offspring regardless of the interpretation function used.

Finally,  deception  can  occur  when  the  parents  differ  topologically.  The  types  of  distributed 
representations  that  can  develop  in  a  network  vary  widely  with  the  number  of  hidden  units  and  the 
network's connectivity. Thus, the distributed representations of topologically distinct networks
have  a  greater  chance  of  being  incompatible  parents.  This  further  reduces  the  likelihood  that 
crossover  will  produce  good  offspring. 

In short, for crossover to be a viable operator when evolving networks, the interpretation
function must somehow compensate for all the types of deceptiveness described above. This suggests
that the complexity of an appropriate interpretation function will more than rival the complexity
of the original learning problem. Thus, the prospect of evolving connectionist networks with
crossover appears limited in general, and better results should be expected with reproduction
heuristics that respect the uniqueness of the distributed representations. This point has been tacitly
validated in the genetic algorithm literature by a trend towards a reduced reliance on binary
representations when evolving networks (e.g. [32, 33]). Crossover, however, is still commonplace.


2.2  Networks  and  Evolutionary  Programming 

Unlike genetic algorithms, evolutionary programming (EP) [14, 34] defines representation-dependent
mutation operators that create offspring within a specific locus of the parent (see Figure
3). EP's commitment to mutation as the sole reproductive operator for searching over a space is
preferable when there is no sufficient calculus to guide recombination by crossover, or when
separating the search and evaluation spaces does not afford an advantage.

Relatively few previous EP systems have addressed the problem of evolving connectionist
networks. Fogel et al. [33] investigate training feedforward networks on some classic connectionist
problems. McDonnell and Waagen [36] use EP to evolve the connectivity of feedforward networks
with a constant number of hidden units by evolving both a weight matrix and a
connectivity matrix. Fogel [14], [37] uses EP to induce three-layer fully-connected feedforward
networks with a variable number of hidden units that employ good strategies for playing
Tic-Tac-Toe.

Figure 3. The evolutionary programming approach to modeling evolution. Unlike genetic algorithms,
evolutionary programs perform search in the space of networks. Offspring created by mutation remain
within a locus of similarity to their parents.

In each of the above studies, the mutation operator alters the parameters of network $\eta$ by the
function:

$$w = w + N(0, \alpha\,\epsilon(\eta)) \quad \forall w \in \eta \qquad \text{(EQ1)}$$

where $w$ is a weight, $\epsilon(\eta)$ is the error of the network on the task (typically the mean
squared error), $\alpha$ is a user-defined proportionality constant, and $N(\mu, \sigma^2)$ is a
gaussian variable with mean $\mu$ and variance $\sigma^2$. The implementations of structural
mutations in these studies differ somewhat. McDonnell and Waagen randomly select a set of weights
and alter their values with a probability based on the variance of the incident nodes' activation
over the training set; connections from nodes with a high variance have less of a chance of being
altered. The structural mutation used in [14, 37] adds or deletes a single hidden unit with equal
probability.

Evolutionary programming provides distinct advantages over genetic algorithms when evolving
networks. First, EP manipulates networks directly, thus obviating the need for a dual
representation and the associated interpretation function. Second, by avoiding crossover between
networks in creating offspring, the individuality of each network's distributed representation is
respected. For these reasons, evolutionary programming provides a more appropriate framework
for simultaneous structural and parametric learning in recurrent networks. The GNARL
algorithm, presented in the next section and investigated in the remainder of this paper, describes
one such approach.



3.0  The  GNARL  Algorithm 

GNARL, which stands for GeNeralized Acquisition of Recurrent Links, is an evolutionary
algorithm that nonmonotonically constructs recurrent networks to solve a given task. The name
GNARL reflects the types of networks that arise from a generalized network induction algorithm
performing both structural and parametric learning. Instead of having uniform or symmetric
topologies, the resulting networks have “gnarled” interconnections of hidden units which more
accurately reflect constraints inherent in the task.

Figure 4. Sample initial network. The number of input nodes ($m_{in}$) and number of output nodes
($m_{out}$) is fixed for a given task. The presence of a bias node ($b = 0$ or $1$) as well as the
maximum number of hidden units ($h_{max}$) is set by the user. The initial connectivity is chosen
randomly (see text). The disconnected hidden node does not affect this particular network's
computation, but is available as a resource for structural mutations. (Diagram labels:
$m_{in} + b$ input nodes; $m_{out}$ output nodes; at most $h_{max}$ hidden nodes.)

The general architecture of a GNARL network is straightforward. The input and output nodes
are considered to be provided by the task and are immutable by the algorithm; thus each network
for a given task always has $m_{in}$ input nodes and $m_{out}$ output nodes. The number of hidden
nodes varies from 0 to a user-supplied maximum $h_{max}$. Bias is optional; if provided in an
experiment, it is implemented as an additional input node with constant value one. All non-input
nodes employ the standard sigmoid activation function. Links use real-valued weights, and must obey
three restrictions:

R1: There can be no links to an input node.

R2: There can be no links from an output node.

R3: Given two nodes x and y, there is at most one link from x to y.

Thus GNARL networks may have no connections, sparse connections, or full connectivity.
Consequently, GNARL's search space is:

$$
\begin{aligned}
S = \{\, \eta :\ & \eta \text{ is a network with real-valued weights}, \\
& \eta \text{ satisfies } R_1\text{-}R_3, \\
& \eta \text{ has } m_{in} + b \text{ input nodes, where } b = 1 \text{ if a bias node is provided, and } 0 \text{ otherwise}, \\
& \eta \text{ has } m_{out} \text{ output nodes}, \\
& \eta \text{ has } i \text{ hidden nodes},\ 0 \le i \le h_{max} \,\}
\end{aligned}
$$

R1-R3 are strictly implementational constraints. Nothing in the algorithm described below hinges
on S being pruned by these restrictions.
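
As a minimal sketch of how membership in this search space might be checked, the following Python fragment stores a network as a dictionary of weighted links and tests the three restrictions together with the bound on hidden nodes. The `Network` class, the node numbering (inputs first, then outputs, then hidden nodes), and the function name `in_search_space` are assumptions made for illustration; the text does not prescribe a data structure.

```python
from dataclasses import dataclass, field

@dataclass
class Network:
    n_in: int       # number of input nodes, including the bias node if present
    n_out: int      # number of output nodes
    n_hidden: int   # current number of hidden nodes
    links: dict = field(default_factory=dict)  # (src, dst) -> real-valued weight

    # Assumed node numbering: inputs first, then outputs, then hidden nodes.
    def is_input(self, node):
        return 0 <= node < self.n_in

    def is_output(self, node):
        return self.n_in <= node < self.n_in + self.n_out

    def n_nodes(self):
        return self.n_in + self.n_out + self.n_hidden

def in_search_space(net, h_max):
    """Return True if net obeys the three link restrictions and the hidden-node bound."""
    if not 0 <= net.n_hidden <= h_max:
        return False
    for src, dst in net.links:
        if not (0 <= src < net.n_nodes() and 0 <= dst < net.n_nodes()):
            return False
        if net.is_input(dst):   # first restriction: no links to an input node
            return False
        if net.is_output(src):  # second restriction: no links from an output node
            return False
    # Third restriction holds by construction: dictionary keys (src, dst) are unique.
    return True
```

Keying the links by `(src, dst)` makes the at-most-one-link restriction structural rather than something to be checked, and it permits recurrent links such as `(3, 3)` on a hidden node.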

3.1  Selection,  Reproduction  and  Mutation  of  Networks 

GNARL initializes the population with randomly generated networks (see Figure 4). The
number of hidden nodes for each network is chosen from a uniform distribution over a user-supplied
range. The number of initial links is chosen similarly from a second user-supplied range.
The incident nodes for each link are chosen in accordance with the structural mutations described
below. Once a topology has been chosen, all links are assigned random weights, selected
uniformly from the range [-1, 1]. There is nothing in this initialization procedure that forces a
node to have any incident links, let alone for a path to exist between the input and output nodes.
In the experiments below, the number of hidden units for a network in the initial population was
selected uniformly between one and five and the number of initial links varied uniformly between
one and 10.
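
The initialization procedure can be sketched as follows. This is a simplified illustration, not the authors' implementation: link endpoints are drawn uniformly from the legal sources and targets, whereas the text states that they are chosen in accordance with the structural mutation operators. The function name `random_network`, the dictionary layout, and the node numbering are hypothetical.

```python
import random

def random_network(n_in, n_out, hidden_range=(1, 5), link_range=(1, 10), rng=random):
    """Build a random initial network as a dict; node ids run inputs, outputs, hidden."""
    n_hidden = rng.randint(*hidden_range)   # uniform over the user-supplied range
    n_links = rng.randint(*link_range)      # uniform over a second user-supplied range
    inputs = list(range(n_in))
    outputs = list(range(n_in, n_in + n_out))
    hidden = list(range(n_in + n_out, n_in + n_out + n_hidden))
    sources = inputs + hidden               # no links from an output node
    targets = outputs + hidden              # no links to an input node
    links = {}
    max_links = len(sources) * len(targets)
    while len(links) < min(n_links, max_links):
        pair = (rng.choice(sources), rng.choice(targets))
        if pair not in links:               # at most one link from x to y
            links[pair] = rng.uniform(-1.0, 1.0)
    return {"n_in": n_in, "n_out": n_out, "n_hidden": n_hidden, "links": links}
```

Nothing here guarantees that a hidden node receives a link or that a path joins inputs to outputs, which matches the remark above about the initialization procedure.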


In each generation of search, the networks are first evaluated by a user-supplied fitness
function $f: S \rightarrow \mathbb{R}$, where $\mathbb{R}$ represents the reals. Networks scoring in
the top 50% are designated as the parents of the next generation; all other networks are discarded.
This selection method is used in many EP algorithms although competitive methods of selection have
also been investigated [14].
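
The selection step amounts to truncation at the median. A minimal sketch, assuming that larger fitness values are better and using a hypothetical function name:

```python
def select_parents(population, fitness):
    """Keep the top-scoring half of the population as parents of the next generation."""
    ranked = sorted(population, key=fitness, reverse=True)
    return ranked[: len(ranked) // 2]
```

For example, `select_parents([3, 1, 4, 1, 5, 9], lambda n: n)` keeps the three highest-scoring members; in GNARL the population members would be networks and `fitness` the user-supplied function f.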

Generating  an  offspring  involves  three  steps:  copying  the  parent,  determining  the  severity  of 
the  mutations  to  be  performed,  and  finally  mutating  the  copy.  Network  mutations  are  separated 
into  two  classes,  corresponding  with  the  types  of  learning  discussed  in  [1].  Parametric  mutations 
alter  the  value  of  parameters  (link  weights)  currently  in  the  network,  whereas  structural  mutations 
alter the number of hidden nodes and the presence of links in the network, thus altering the space
of  parameters. 

3.1.1  Severity  of  Mutations 

The severity of a mutation to a given parent, $\eta$, is dictated by that network's temperature,
$T(\eta)$:

$$T(\eta) = 1 - \frac{f(\eta)}{f_{max}} \qquad \text{(EQ2)}$$

where $f_{max}$ is the maximum fitness for a given task. Thus, the temperature of a network is
determined by how close the network is to being a solution for the task. This measure of the
network's performance is used to anneal the structural and parametric similarity between parent and
offspring, so that networks with a high temperature are mutated severely, and those with a low
temperature are mutated only slightly (cf. [38]). This allows a coarse-grained search initially,
and a progressively finer-grained search as a network approaches a solution to the task, a process
described more concretely below.
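
A short worked example of the temperature measure, assuming fitness is nonnegative and bounded above by the task's maximum fitness. The fitness values and the maximum of 100 are hypothetical, chosen only to show how temperature falls as a network improves.

```python
def temperature(fitness, f_max):
    """Temperature of a network: 1 at zero fitness, falling linearly to 0 at f_max."""
    return 1.0 - fitness / f_max

# Hypothetical fitness values for a task whose maximum fitness is 100.
schedule = [temperature(f, 100.0) for f in (0.0, 50.0, 90.0, 100.0)]
```

A network at half the maximum fitness has temperature 0.5, and a perfect network has temperature 0, so its offspring would be left unmodified.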

3.1.2 Parametric Mutation of Networks

Parametric mutations are accomplished by perturbing each weight $w$ of a network $\eta$ with
gaussian noise, a method motivated by [37, 14]. In that body of work, weights are modified as
follows:

$$w = w + N(0, \alpha\,T(\eta)) \quad \forall w \in \eta \qquad \text{(EQ3)}$$

where $\alpha$ is a user-defined proportionality constant, and $N(\mu, \sigma^2)$ is a gaussian
random variable as before. While large parametric mutations are occasionally necessary to avoid
parametric local minima during search, it is more likely they will adversely affect the offspring's
ability to perform better than its parent. To compensate, GNARL updates weights using a variant of
equation 3. First the instantaneous temperature $\hat{T}$ of the network is computed:

$$\hat{T}(\eta) = U(0, 1)\,T(\eta) \qquad \text{(EQ4)}$$

where $U(0, 1)$ is a uniform random variable over the interval [0, 1]. This new temperature,
varying from 0 to $T(\eta)$, is then substituted into equation 3:

$$w = w + N(0, \alpha\,\hat{T}(\eta)) \quad \forall w \in \eta \qquad \text{(EQ5)}$$

In essence, this modification lessens the frequency of large parametric mutations without
disallowing them completely. In the experiments described below, $\alpha$ is one.
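
The parametric mutation can be sketched in a few lines. The sketch assumes that the second argument of the gaussian is a variance, as stated where the gaussian variable is first defined, so the standard deviation passed to the random number generator is its square root. It also assumes the instantaneous temperature is drawn once per network rather than once per weight. The function name is hypothetical.

```python
import math
import random

def mutate_weights(weights, temp, alpha=1.0, rng=random):
    """Return a perturbed copy of weights, given the parent's temperature temp."""
    t_hat = rng.uniform(0.0, 1.0) * temp   # instantaneous temperature, drawn once
    sigma = math.sqrt(alpha * t_hat)       # gaussian's second argument read as a variance
    return [w + rng.gauss(0.0, sigma) for w in weights]
```

Because the instantaneous temperature is uniform between 0 and the parent's temperature, most offspring receive smaller perturbations than the parent's temperature alone would give, while large perturbations remain possible.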

3.1.3 Structural Mutation of Networks

The structural mutations used by GNARL alter the number of hidden nodes and the connectivity
between all nodes, subject to restrictions R1-R3 discussed earlier. To avoid radical jumps in
fitness from parent to offspring, structural mutations attempt to preserve the behavior of a
network. For instance, new links are initialized with zero weight, leaving the behavior of the
modified network unchanged. Similarly, hidden units are added to the network without any incident
connections. Links must be added by future structural mutations to determine how to incorporate
the new computational unit. Unfortunately, achieving this behavioral continuity between parent
and child is not so simple when removing a hidden node or link. Consequently, the deletion of a
node involves the complete removal of the node and all incident links with no further modification
to compensate for the behavioral change. Similarly, deleting a link removes that parameter
from the network.

The selection of which node to remove is uniform over the collection of hidden nodes. Addition
or deletion of a link is slightly more complicated in that a parameter identifies the likelihood
that the link will originate from an input node or terminate at an output node. Once the class of
incident node is determined, an actual node is chosen uniformly from the class. Biasing the link
selection process in this way is necessary when there is a large differential between the number of
hidden nodes and the number of input or output nodes. This parameter was set to 0.2 in the
experiments described in the next section.
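
One possible reading of this biased selection is sketched below. It assumes a single parameter (here `p_io`, a hypothetical name) that gives both the probability that the source is an input node and the probability that the target is an output node, with hidden nodes chosen otherwise. The text does not spell out these details, so the sketch should be read as an interpretation rather than the authors' procedure.

```python
import random

def pick_link_endpoints(inputs, outputs, hidden, p_io=0.2, rng=random):
    """Choose (source, target) node ids for a link to add or delete."""
    # Source class: input with probability p_io, otherwise hidden
    # (links may not originate from output nodes).
    use_inputs = rng.random() < p_io or not hidden
    src_pool = inputs if use_inputs else hidden
    # Target class: output with probability p_io, otherwise hidden
    # (links may not terminate at input nodes).
    use_outputs = rng.random() < p_io or not hidden
    dst_pool = outputs if use_outputs else hidden
    # Within the chosen class, the node itself is picked uniformly.
    return rng.choice(src_pool), rng.choice(dst_pool)
```

Choosing the class first keeps the share of input and output links fixed by the parameter regardless of how many hidden nodes the network currently has.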

Research in [14] and [37] uses the heuristic of adding or deleting at most a single fully
connected node per structural mutation. Therefore, it is possible for this method to become trapped
at a structural local minimum, although this is less probable than in nonevolutionary algorithms
given that several topologies may be present in the population. In order to more effectively search
the range of network architectures, GNARL uses a severity of mutation for each separate
structural mutation. A unique user-defined interval specifying a range of modification is associated
with each of the four structural mutations. Given an interval of $[\Delta_{min}, \Delta_{max}]$ for
a particular structural mutation, the number of modifications of this type made to an offspring is
given by:

$$\Delta_{min} + \lfloor U[0, 1]\,\hat{T}(\eta)\,(\Delta_{max} - \Delta_{min}) \rfloor \qquad \text{(EQ6)}$$

Thus the number of modifications varies uniformly over a shrinking interval based on the parent
network's fitness. In the experiments below, the maximum number of nodes added or deleted was
three while the maximum number of links added or deleted was five. The minimum number for
each interval was always one.
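
A minimal sketch of how the number of structural modifications might be drawn. It assumes the count takes the form of the interval minimum plus the floor of a uniform draw times the instantaneous temperature times the interval width, which is one reading consistent with a count that varies over an interval shrinking with the parent's temperature. The function and parameter names are hypothetical.

```python
import math
import random

def num_modifications(delta_min, delta_max, temp, rng=random):
    """Number of structural modifications of one type for an offspring."""
    t_hat = rng.random() * temp  # instantaneous temperature of the parent
    return delta_min + math.floor(rng.random() * t_hat * (delta_max - delta_min))
```

With the experimental settings quoted above (minimum one, maximum three for nodes and five for links), a parent at temperature 0 always receives exactly one modification of each type, and hotter parents may receive more.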

3.2  Fitness  of  a  Network 

In evolving networks to perform a task, GNARL does not require an explicit target vector;
all that is needed is the feedback given by the fitness function $f$. But if such a vector is
present, as in supervised learning, there are many ways of transforming it into a measure of
fitness. For example, given a training set $\{(x_1, y_1), (x_2, y_2), \ldots\}$, three possible
measures of fitness for a network $\eta$ are sum of square errors (equation 7), sum of absolute
errors (equation 8), and sum of exponential absolute errors (equation 9):

$$\sum_i \left(y_i - Out(\eta, x_i)\right)^2 \qquad \text{(EQ7)}$$

$$\sum_i \left|y_i - Out(\eta, x_i)\right| \qquad \text{(EQ8)}$$

$$\sum_i e^{\left|y_i - Out(\eta, x_i)\right|} \qquad \text{(EQ9)}$$

Furthermore, because GNARL explores the space of networks by mutation and selection, the
choice of fitness function does not alter the mechanics of the algorithm. To show GNARL's
flexibility, each of these fitness functions will be demonstrated in the experiments below.
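
The three measures can be written directly from their names. In the sketch below, `outputs` stands for the network's responses to the training inputs; how an error total is converted into a fitness score to be maximized is not specified here and is left out. The function names are hypothetical.

```python
import math

def sum_square_error(targets, outputs):
    """Sum of squared differences between targets and network outputs."""
    return sum((y - o) ** 2 for y, o in zip(targets, outputs))

def sum_abs_error(targets, outputs):
    """Sum of absolute differences between targets and network outputs."""
    return sum(abs(y - o) for y, o in zip(targets, outputs))

def sum_exp_abs_error(targets, outputs):
    """Sum of exponentiated absolute differences; large errors are penalized heavily."""
    return sum(math.exp(abs(y - o)) for y, o in zip(targets, outputs))
```

Note that the exponential measure contributes 1 for every exactly correct output, so its minimum over a training set equals the number of training cases rather than zero.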


4.0  Experiments 

In this section, GNARL is applied to several problems of interest. The goal in this section is to
demonstrate  the  abilities  of  the  algorithm  on  problems  from  language  induction  to  search  and  col¬ 
lection.  The  various  parameter  values  for  the  program  are  set  as  described  above  unless  otherwise 
noted. 

4.1 Williams' Trigger Problem

As an initial test, GNARL induced a solution for the enable-trigger task proposed in [39].
Consider the finite state generator shown in Figure 5. At each time step the system receives two
input bits, (a, b), representing “enable” and “trigger” signals, respectively. This system begins in
state S1, and switches to state S2 only when enabled by a = 1. The system remains in S2 until it is
triggered by b = 1, at which point it outputs 1 and resets the state to S1. So, for instance, on an
input stream {(0, 0), (0, 1), (1, 1), (0, 1)}, the system will output {0, 0, 0, 1} and end in S1.
This simple problem allows an indefinite amount of time to pass between the enable and the trigger
inputs; thus no finite length sample of the output stream will indicate the current state of the
system. This forces GNARL to develop networks that can preserve state information indefinitely.

Figure 5. An FSA that defines the enable-trigger task [39]. The system is given a data stream of bit
pairs {(a1, b1), (a2, b2), ...}, and produces an output stream of 0s and 1s. To capture this
system's input/output behavior, a connectionist network must learn to store state indefinitely.
(Diagram label: a=1 -> output 0.)
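
The target behavior can be reproduced with a two-state simulator, which is also a convenient way to generate training targets for strings of any length. The sketch assumes that an enabling input on the same step as a trigger does not immediately re-enable the system; the description above does not settle that case. The function name is hypothetical.

```python
from itertools import product

def enable_trigger(stream):
    """Run the enable-trigger machine over (a, b) pairs; return (outputs, final_state)."""
    state = 1  # the machine starts in S1
    outputs = []
    for a, b in stream:
        if state == 1:
            outputs.append(0)
            if a == 1:        # enabled: move to S2, still emitting 0
                state = 2
        elif b == 1:          # triggered while in S2: emit 1 and reset to S1
            outputs.append(1)
            state = 1
        else:                 # waiting in S2
            outputs.append(0)
    return outputs, state

# Targets for every input string of length two (16 strings over four symbols).
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
length_two_targets = {s: enable_trigger(s)[0] for s in product(pairs, repeat=2)}
```

Running the simulator on the stream from the text, `enable_trigger([(0, 0), (0, 1), (1, 1), (0, 1)])`, gives the outputs and final state described there.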

The fitness function used in this experiment was the sum of exponential absolute errors
(equation 9). Population size was 50 networks with the number of hidden units restricted to
six. A bias node was provided in each network in this initial experiment, ensuring that an
activation value of 1 was always available. Note that this does not imply that each node had a
nonzero bias; links to the bias node had to be acquired by structural mutation.

Training began with all 16 input strings of length two, shown in Table 1. After 118 generations
(3000 network evaluations; see footnote 2), GNARL evolved a network which solved this task for the
strings in Table 1 within tolerance of 0.3 on the output units. The training set was then increased
to include all 64 input strings of length three and evolution of the networks was allowed to
continue. After an additional 422 generations, GNARL once again found a suitable network. At this
point, the difficulty of the task was increased a final time by training on all 256 strings of length
four. After another 225 generations (~20000 network evaluations total) GNARL once again found
a network to solve this task, shown in Figure 6b. Note that there are two completely isolated
nodes. Given the fitness function used in this experiment, the two isolated nodes do not affect the
network's viability. To investigate the generalization of this network, it was tested over all 4096
unique strings of length six. The outputs were rounded off to the nearest integer, testing only the
network's separation of the strings. The network performed correctly on 99.5% of this novel set,
generating incorrect responses for only 20 strings.

Figure 7 shows the connectivity of the population member with the best fitness for each
generation over the course of the run. Initially, the best network is sparsely-connected and remains
sparsely-connected throughout most of the run. At about generation 400, the size and connectivity
increases dramatically only to be overtaken by the relatively sparse architecture shown in Figure
6b on the final generation. Apparently, this more sparsely connected network evolved more
quickly than the more full architectures that were best in earlier generations. The oscillations
between different network architectures throughout the run reflect the development of such
competing architectures in the population.

Figure 6. Connectivity of two recurrent networks found in the enable-trigger experiment. (a) The
best network of generation 1. (b) The best network of generation 765. This network solves the task
for all strings of length eight.

2. Number of networks evaluated = |population| + generations * |population| * 50% of the
population removed each generation, giving 50 + 118 * 50 * 0.5 = 3000 network evaluations for this
trial.
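
The evaluation count in footnote 2 is simple arithmetic: the whole initial population is evaluated once, and then only the replaced half is evaluated in each generation. A small sketch, with a hypothetical function name:

```python
def evaluations(pop_size, generations, replaced_fraction=0.5):
    """Networks evaluated: the initial population plus the offspring of each generation."""
    return pop_size + int(generations * pop_size * replaced_fraction)

first_stage = evaluations(50, 118)        # population of 50 after 118 generations
whole_run = evaluations(50, 118 + 422 + 225)
```

The first value reproduces the 3000 evaluations quoted for the first training stage, and the second, 19175, is the figure the text rounds to roughly 20000 for the whole run of 765 generations.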


4.2  Inducing  Regular  Languages 

A current topic of research in the connectionist community is the induction of finite state
automata (FSAs) by networks with second-order recurrent connections. For instance, Pollack [40]
trains sequential cascaded networks (SCNs) over a test set of languages, provided in [41] and
shown in Table 2, using a variation of backpropagation. An interesting result of this work is that
the number of states used by the network to implement finite state behavior is potentially infinite.
Other studies using the training sets in [41] have investigated various network architectures and
training methods, as well as algorithms for extracting FSAs from the trained architectures [42 -
45].

Input              Target Output     Input              Target Output
{(0,0), (0,0)}     {0, 0}            {(1,0), (0,0)}     {0, 0}
{(0,0), (0,1)}     {0, 0}            {(1,0), (0,1)}     {0, 1}
{(0,0), (1,0)}     {0, 0}            {(1,0), (1,0)}     {0, 0}
{(0,0), (1,1)}     {0, 0}            {(1,0), (1,1)}     {0, 1}
{(0,1), (0,0)}     {0, 0}            {(1,1), (0,0)}     {0, 0}
{(0,1), (0,1)}     {0, 0}            {(1,1), (0,1)}     {0, 1}
{(0,1), (1,0)}     {0, 0}            {(1,1), (1,0)}     {0, 0}
{(0,1), (1,1)}     {0, 0}            {(1,1), (1,1)}     {0, 1}

Table 1. Initial training data for enable-trigger task.

Figure 7. Different network topologies explored by GNARL during the first 540 generations on the
enable-trigger problem. The presence of a link between node i and j at generation g is indicated by
a dot at position (g, 10 * i + j) in the graph. Note that because node 3 is the output node, there
are no connections from it throughout the run. The arrow designates the point of transition between
the first two training sets. (Horizontal axis: generation number.)

An explicit collection of positive and negative examples, shown in Table 3, that pose specific
difficulties for inducing the intended languages is offered in [41]. Notice that the training sets are
unbalanced, incomplete and vary widely in their ability to strictly define the intended regular
language. GNARL's ability to learn and generalize from these training sets was compared against the
training results reported for the second-order architecture used in [42]. Notice that all the
languages in Table 2 require recurrent network connections in order to induce the language
completely. The type of recurrence needed for each language varies widely. For instance, languages 1
through 4 require an incorrect input be remembered indefinitely, forcing the network to develop
an analog version of a trap state. Networks for language 6, however, must parse and count
individual inputs, potentially changing state from accept to reject or vice versa on each successive
input.

Language   Description
1          1*
2          (1 0)*
3          no odd length 0 strings anytime after an odd length 1 string
4          no more than two 0s in a row
5          an even sum of 10s and 01s, pairwise
6          (number of 1s - number of 0s) mod 3 = 0
7          0*1*0*1*

Table 2. Regular languages to be induced.
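
Membership tests for most of these languages are short enough to write down, which is useful for scoring a network's generalization over all strings up to a given length. The sketch below covers languages 1, 2, 3, 4, 6 and 7, where language 7 is taken to be the standard 0*1*0*1* of this benchmark set. Language 5 is omitted because its one-line description is ambiguous without the original source. The function names are hypothetical, and the reading of language 3 (once an odd-length run of 1s has occurred, no later run of 0s may have odd length) is an interpretation that agrees with the training examples checked against it.

```python
import re
from itertools import groupby

def lang1(s):
    return re.fullmatch(r"1*", s) is not None

def lang2(s):
    return re.fullmatch(r"(10)*", s) is not None

def lang3(s):
    """No odd-length run of 0s anywhere after an odd-length run of 1s."""
    seen_odd_ones = False
    for symbol, run in groupby(s):
        length = len(list(run))
        if symbol == "1" and length % 2 == 1:
            seen_odd_ones = True
        elif symbol == "0" and length % 2 == 1 and seen_odd_ones:
            return False
    return True

def lang4(s):
    return "000" not in s

def lang6(s):
    return (s.count("1") - s.count("0")) % 3 == 0

def lang7(s):
    return re.fullmatch(r"0*1*0*1*", s) is not None
```

All six functions accept the empty string, in line with its appearance among the positive training instances.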

The results obtained in [42] are summarized in Table 4. The table shows the number of networks
evaluated to learn the training set and the accuracy of generalization for the learned network
to the intended regular language. Accuracy is measured as the percentage of strings of
length 10 or less that are correctly classified by the network. For comparison, the table lists both
the average and best performance of the five runs reported in [42].

Language 1
  Positive instances: ε, 1, 11, 111, 1111, 11111, 111111, 1111111, 11111111
  Negative instances: 0, 10, 01, 00, 011, 110, 000, 11111110, 10111111

Language 2
  Positive instances: ε, 10, 1010, 101010, 10101010, 10101010101010
  Negative instances: 1, 0, 11, 00, 01, 101, 100, 1001010, 10110, 110101010

Language 3
  Positive instances: ε, 1, 0, 01, 11, 00, 100, 110, 111, 000, 100100, 110000011100001,
                      111101100010011100
  Negative instances: 10, 101, 010, 1010, 1110, 1011, 10001, 111010, 1001000, 11111000,
                      0111001101, 11011100110

Language 4
  Positive instances: ε, 1, 0, 10, 01, 00, 100100, 001111110100, 0100100100, 11100, 010
  Negative instances: 000, 11000, 0001, 000000000, 00000, 0000, 11111000011,
                      1101010000010111, 1010010001

Language 5
  Positive instances: ε, 11, 00, 001, 0101, 1010, 1000111101, 1001100001111010, 111111, 0000
  Negative instances: 1, 0, 111, 010, 000000000, 1000, 01, 10, 1110010100, 010111111110,
                      0001, 011

Language 6
  Positive instances: ε, 10, 01, 1100, 101010, 111, 000000, 0111101111, 100100100
  Negative instances: 1, 0, 11, 00, 101, 011, 11001, 1111, 00000000, 010111, 10111101111,
                      1001001001

Language 7
  Positive instances: ε, 1, 0, 10, 01, 11111, 000, 00110011, 0101, 0000100001111, 00100,
                      011111011111, 00
  Negative instances: 1010, 00110011000, 0101010101, 1011010, 10101, 010100, 101001,
                      100100110101

Table 3. Training sets for the languages of Table 2 from [41].

Language   Average evaluations   Average % accuracy   Fewest evaluations   Best % accuracy
1          3033.8                [missing]            28                   100.0
2          4522.6                91.18                807                  100.0
3          12326.8               64.87                442                  78.31
4          4393.2                42.50                60                   60.92
5          1587.2                44.94                368                  66.83
6          2137.6                23.19                306                  46.21
7          2969.0                36.97                373                  55.74

Table 4. Speed and generalization results reported by [42] for learning the data sets of Table 3.

This experiment used a population of 50 networks, each limited to at most eight hidden units.
Each run lasted at most 1000 generations, allowing a maximum of 25050 networks to be evaluated
for a single data set. Two experiments were run for each data set, one using the sum of absolute
errors (SAE) and the other using sum of square errors (SSE). The error for a particular string
was computed only for the final output of the network after the entire string plus three trailing
“null” symbols had been entered, one input per time step. The concatenation of the trailing null
symbols was used to identify the end of the string and allow input of the null string, a method also
used in [42]. Each network had a single input and output and no bias node was provided. The
three possible logical inputs for this task, 0, 1, and null, were represented by activations of -1, 1,
and 0, respectively. The tolerance for the output value was 0.1, as in [42].
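
The input encoding described here is easy to state in code. The sketch below maps a binary string to the sequence of activations presented to the single input node, one per time step, and appends the three trailing null symbols. The helper names and the `within_tolerance` check are hypothetical; the target values used for accept and reject are not given in the text and are therefore left to the caller.

```python
ACTIVATION = {"0": -1.0, "1": 1.0}  # logical 0 and 1 as input activations
NULL = 0.0                           # activation of the trailing "null" symbol

def encode(string, n_trailing=3):
    """Input activations for a string, followed by the trailing null symbols."""
    return [ACTIVATION[c] for c in string] + [NULL] * n_trailing

def within_tolerance(output, target, tol=0.1):
    """True if the final network output is within tol of the target value."""
    return abs(output - target) <= tol
```

Under this encoding the empty string is presented as three null inputs, which is what allows it to appear in the training sets.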

Table 5 shows for both fitness functions the number of evaluations until convergence and the
accuracy of the best evolved network. Only four of the runs, each of those denoted by a ‘+’ in the
table, failed to produce a network with the specified tolerance in the allotted 1000 generations. In
the runs using SAE, the two runs that did not converge had not separated a few elements of the
associated training set and appeared to be far from discovering a network that could correctly
classify the complete training set. Both of the uncompleted runs using SSE successfully separated
the data sets but had not done so to the 0.1 tolerance within the 1000 generation limit. Figure 8
compares the number of evaluations by GNARL to the average number of evaluations reported in
[42]. As the graph shows, GNARL consistently evaluates more networks, but not a disproportionate
number. Considering that the space of networks being searched by GNARL is much larger
than the space being searched by [42], these numbers appear to be within a tolerable increase.

The graph of Figure 9 compares the accuracy of the GNARL networks to the average accuracy
found in [42] over five runs. The GNARL networks consistently exceeded the average accuracy
found in [42].

Figure 8. The number of network evaluations required to learn the seven data sets of Table 3. GNARL
(using both SAE and SSE fitness measures) compared to the average number of evaluations for the five
runs described in [42]. (Axes: training set versus evaluations; series: SAE fitness, SSE fitness,
result from [42].)


These results demonstrate GNARL's ability to simultaneously acquire the topology and
weights of recurrent networks, and that this can be done within a comparable number of network
evaluations as training a network with static architecture on the same task. GNARL also appears
to generalize better consistently, possibly due to its selective inclusion and exclusion of some
links.

Language   Evaluations (SAE)   % Accuracy (SAE)   Evaluations (SSE)   % Accuracy (SSE)
1          3973                100.00             3300                [illegible]
2          [missing]           96.34              13973               73.33
3          25050+              38.87              18630               68.00
4          13773               92J7*              21830               37.13
5          25050+              49.39              22323               31.23
6          21473               33 J9*             25050+              44.11
7          12200               71.37*             25050+              31.46

Table 5. Speed and generalization results for GNARL to train recurrent networks to recognize the
data sets of Table 3.

Figure 9. Percentage accuracy of evolved networks on languages in Table 2. GNARL (using SAE and SSE
fitness measures) compared to average accuracy of the five runs in [42]. (Horizontal axis: language.)


4.3  The  Ant  Problem 

GNARL was tested on a complex search and collection task - the Tracker task described in
[46], and further investigated in [47]. In this problem, a simulated ant is placed on a
two-dimensional toroidal grid that contains a trail of food. The ant traverses the grid, collecting
any food it contacts along the way. The goal of the task is to discover an ant which collects the
maximum number of pieces of food in a given time period (Figure 10).

Following [46], each ant is controlled by a network with two input nodes and four output
nodes (Figure 11). The first input node denotes the presence of food in the square directly in front
of the ant; the second denotes the absence of food in this same square, restricting the possible
legal inputs to the network to (1, 0) or (0, 1). Each of the four output units corresponds to a unique
action: move forward one step, turn left 90°, turn right 90°, or no-op. At each step, the action
whose corresponding output node has maximum activation is performed. As in the original study
[46], no-op allows the ant to remain at a fixed position while activation flows along recurrent
connections. Fitness is defined as the number of grid positions cleared within 200 time steps. The task
is difficult because simple networks can perform surprisingly well; the network shown in Figure
11 collects 42 pieces of food before spinning endlessly at position A (in Figure 10), illustrating a
very high local maximum in the search space.
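The input encoding and the winner-take-all choice of action described above can be sketched as follows. This is only an illustration, not the code used in the experiments: the names `ACTIONS`, `HEADINGS`, `cell_ahead`, `sense`, `select_action` and `step`, the heading convention, and the representation of the trail as a set of coordinates are all assumptions introduced here.

```python
ACTIONS = ("move", "left", "right", "noop")    # one action per output node
# Headings in clockwise order; y grows downward (assumed convention).
HEADINGS = ((0, -1), (1, 0), (0, 1), (-1, 0))  # N, E, S, W


def cell_ahead(pos, heading, size):
    """Square directly in front of the ant on a toroidal size x size grid."""
    dx, dy = HEADINGS[heading]
    return ((pos[0] + dx) % size, (pos[1] + dy) % size)


def sense(pos, heading, food, size):
    """Two-node input: (1, 0) if food is directly ahead, otherwise (0, 1)."""
    return (1, 0) if cell_ahead(pos, heading, size) in food else (0, 1)


def select_action(outputs):
    """Action whose output node has maximum activation (first one on ties)."""
    best = max(range(len(ACTIONS)), key=lambda i: outputs[i])
    return ACTIONS[best]


def step(pos, heading, food, size, action):
    """Apply one action; return (new position, new heading, food eaten)."""
    eaten = 0
    if action == "move":
        pos = cell_ahead(pos, heading, size)
        if pos in food:
            food.remove(pos)   # the grid position is cleared
            eaten = 1
    elif action == "left":
        heading = (heading - 1) % 4
    elif action == "right":
        heading = (heading + 1) % 4
    # "noop" changes nothing; only the network's recurrent state advances
    return pos, heading, eaten
```

Under these assumptions, fitness would be the sum of `eaten` over 200 calls to `step`, with the network queried through `sense` and `select_action` at every time step.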

The experiment used a population of 100 networks, each limited to at most nine hidden units,
and did not provide a bias node. In the first run (2090 generations), GNARL found a network
(Figure 12b) that clears 81 grid positions within the 200 time steps. When this ant is run for an
additional 119 time steps, it successfully clears the entire trail. To understand how the network
traverses the path of food, consider the simple FSA shown in Figure 13, hand-crafted in [46] as an
approximate solution to the problem. This simple machine receives a score of 81 in the allotted
200 time steps, and clears the entire trail only five time steps faster than the network in Figure
12b. A step by step comparison indicates there is only a slight difference between the two.
GNARL’s evolved network follows the general strategy embodied by this FSA at all but two
places, marked as positions B and C in Figure 10. Here the evolved network makes a few additional
moves, accounting for the slightly longer completion time.

Figure 10. The ant problem. The trail is connected initially, but becomes progressively more difficult to follow.
The underlying 2-d grid is toroidal, so that position “A” is the first break in the trail - it is simple to reach this
point. Positions “B” and “C” indicate the only two positions along the trail where the ant discovered in run 1
behaves differently from the 5-state FSA of [46] (see Figure 13).


[Figure 11 diagram. Output nodes: Move, Left, Right, No-op. Input nodes: Food, No food.]

Figure 11. The semantics of the I/O units for the ant network. The first input node denotes the presence of food
in the square directly in front of the ant; the second denotes the absence of food in this same square. This
particular network finds 42 pieces of food before spinning endlessly in place at position A, illustrating a very
deep local minimum in the search space.

Figure 12. The Tracker Task, first run. (a) The best network in the initial population. Nodes 0 and 1 are input,
nodes 5-8 are output, and nodes 2-4 are hidden nodes. (b) Network induced by GNARL after 2090 generations.
Forward links are dashed; bidirectional links and loops are solid. The light gray connection between nodes 8
and 13 is the sole backlink. This network clears the trail in 319 epochs.

Figure 14 illustrates the strategy the network uses to implement the FSA by showing the state
of the output units of the network over three different sets. Each point is a triple of the form
(move, right, left).³ Figure 14a shows the result of supplying to the network 200 “food” inputs - a
fixed point that executes “Move.” Figure 14b shows the sequence of states reached when 200 “no
food” signals are supplied to the network - a collection of points describing a limit cycle of length
five that repeatedly executes the sequence “Right, Right, Right, Right, Move.” These two attractors
determine the response of the network to the task (Figure 14c, d); the additional points in Figure
14c are transients encountered as the network alternates between these attractors. The
differences in the number of steps required to clear the trail between the FSA of Figure 13 and
GNARL’s network arise due to the state of the hidden units when transferring from the “food”
attractor to the “no food” attractor.

3. No-op is not shown because it was never used in the final network.

[Figure 13 diagram. State transitions labeled Food/Move, NoFood/Right, and NoFood/Move.]

Figure 13. FSA hand-crafted for the Tracker task in [46]. The large arrow indicates the initial state. This
simple system implements the strategy “move forward if there is food in front of you, otherwise turn right four
times, looking for food. If food is found while turning, pursue it, otherwise, move forward one step and
repeat.” This FSA traverses the entire trail in 314 steps, and gets a score of 81 in the allotted 200 time steps.
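The hand-crafted strategy of Figure 13 can be written as a small transition table. The sketch below is reconstructed from the verbal description in the caption rather than from the diagram itself, so the state numbering 0 to 4 and the names `FSA` and `run_fsa` are assumptions made for illustration.

```python
# (state, food_ahead) -> (action, next_state)
# Reconstructed from the caption's verbal strategy; numbering is assumed.
FSA = {}
for s in range(4):
    FSA[(s, True)] = ("move", 0)         # food ahead: take it and start over
    FSA[(s, False)] = ("right", s + 1)   # no food: turn right, keep looking
FSA[(4, True)] = ("move", 0)
FSA[(4, False)] = ("move", 0)            # four turns found nothing: step ahead


def run_fsa(inputs, state=0):
    """Feed a sequence of food_ahead booleans; return the actions emitted."""
    actions = []
    for food_ahead in inputs:
        action, state = FSA[(state, food_ahead)]
        actions.append(action)
    return actions
```

Feeding this table a constant “no food” signal yields the repeating sequence Right, Right, Right, Right, Move, the same length-five cycle that the evolved network exhibits in Figure 14b, while a constant “food” signal yields Move at every step.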

However, not all evolved network behaviors are so simple as to approximate an FSA [40]. In a
second run (1595 generations) GNARL induced a network that cleared 82 grid points within the
200 time steps. Figure 15 demonstrates the behavior of this network. Once again, the “food”
attractor, shown in Figure 15a, is a single point in the space that always executes “Move.” The
“no food” behavior, however, is not an FSA; instead, it is a quasiperiodic trajectory of points
shaped like a “D” in output space (Figure 15b). The “D” is placed in the “Move / Right”
corner of the space and encodes a complex alternation between these two operations (see Figure
15d).

Figure 14. Limit behavior of the network that clears the trail in 319 steps. Graphs show the state of the output
units Move, Right, Left. (a) Fixed point attractor that results for sequence of 500 “food” signals; (b) Limit cycle
attractor that results when a sequence of 500 “no food” signals is given to network; (c) All states visited while
traversing the trail; (d) The path of the ant on an empty grid. The z axis represents time. Note that x is fixed, and
y increases monotonically at a fixed rate. The large jumps in y position are artifacts of the toroidal grid.

In contrast, research in [46] uses a genetic algorithm on a population of 65,536 bit strings with
a direct encoding to evolve only the weights of a neural network with five hidden units to solve
this task. The particular network architecture in [46] uses Boolean threshold logic for the hidden
units and an identity activation function for the output units. The first GNARL network was
discovered after evaluating a total of 104,600 networks while the second was found after evaluating
79,850. The experiment reported in [46] discovered a comparable network after about 17
generations. Given that [46] used a population size of 65,536 and replaced 95% of the population each
generation, the total number of network evaluations to acquire the equivalent network was 1,123,942.
This is 10.74 and 14.07 times the number of networks evaluated by GNARL in the two runs. In
spite of the differences between the two studies, this significant reduction in the number of
evaluations provides empirical evidence that crossover may not be best suited to the evolution of
networks.
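The evaluation counts used in the comparison with [46] can be checked with a few lines of arithmetic. The sketch below assumes that each total is the initial population plus the offspring evaluated in every generation. The replacement fraction of 0.5 used for GNARL is an assumption made here because it reproduces the reported totals of 104,600 and 79,850; the function name `total_evaluations` is likewise hypothetical.

```python
def total_evaluations(pop_size, generations, replaced_fraction):
    """Initial population plus the offspring evaluated in each generation."""
    return pop_size + generations * replaced_fraction * pop_size


# Genetic algorithm of [46]: 65,536 individuals, 95% replaced, 17 generations.
ga = total_evaluations(65_536, 17, 0.95)

# GNARL runs: 100 networks; a replacement fraction of 0.5 is assumed.
gnarl_run1 = total_evaluations(100, 2090, 0.5)
gnarl_run2 = total_evaluations(100, 1595, 0.5)

print(int(ga), int(gnarl_run1), int(gnarl_run2))
print(ga / gnarl_run1, ga / gnarl_run2)
```

The two ratios come out slightly above 10.74 and 14.07, which matches the figures quoted in the text if those were truncated rather than rounded.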

Figure 15. Limit behavior of the network of the second run. Graphs show the state of the output units Move,
Right, Left. (a) Fixed point attractor that results for sequence of 3500 “food” signals; (b) Limit cycle attractor
that results when a sequence of 3500 “no food” signals is given to network; (c) All states visited while
traversing the trail; (d) The path of the ant on an empty grid. The x axis represents time. The ant’s path is
comprised of a set of “railroad tracks.” Along each track, tick marks represent back and forth movement. At
the junctures between tracks, a more complicated movement occurs. There are no artifacts of the toroidal grid
in this plot; all are actual movements (cf. Figure 14d).

5.0 Conclusions

Allowing the task to specify an appropriate architecture for its solution should, in principle, be
the defining aspect of the complete network induction problem. By restricting the space of
networks explored, constructive, destructive, and genetic algorithms only partially address the
problem of topology acquisition. GNARL’s architectural constraints R1-R3 similarly reduce the search
space, but to a far lesser degree. Furthermore, none of these constraints is necessary, and their
removal would affect only ease of implementation. In fact, no assumed features of GNARL’s
networks are essential for the algorithm’s operation. GNARL could even use nondifferentiable
activation functions, which backpropagation cannot accommodate.

GNARL’s minimal representational constraints would be meaningless if not complemented by
appropriate search dynamics to traverse the space of networks. First, unlike constructive and
destructive algorithms, GNARL permits a nonmonotonic search over the space of network
topologies. Consider that in monotonic search algorithms, the questions of when and how to modify
structure take on great significance because a premature topological change cannot be undone. In
contrast, GNARL can revisit a particular architecture at any point, but for the architecture to be
propagated it must confer an advantage over other competing topologies. Such a non-linear
traversal of the space is imperative for acquiring appropriate solutions because the efficacy of the
various architectures changes as the parametric values are modified.

Second, GNARL allows multiple structural manipulations to a network within a single mutation. As
discussed earlier, constructive and destructive algorithms define a unit of modification, e.g., “add
a fully connected hidden node.” Because such singular structural modifications create a “one-unit
structural horizon” beyond which no information is available, such algorithms may easily fixate
on an architecture that is better than networks one modification step away, but worse than those
two or more steps distant. In GNARL, several nodes and links can be added or deleted with each
mutation, the range being determined by user-specified limits and the current ability of the
network. This simultaneous, fitness-based modification of both structure and parameters
allows the algorithm to discover appropriate networks quickly, especially in comparison to
evolutionary techniques that do not respect the uniqueness of distributed representations.
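One way to make the number of structural changes depend on both user-specified limits and the current ability of the network is sketched below. The linear "temperature" and the scaling rule are an assumed form chosen for illustration, since this section states only that the range depends on those two factors. The names `temperature` and `num_modifications` and the parameters `lo` and `hi` are hypothetical.

```python
import random


def temperature(fitness, max_fitness):
    """1.0 for a network with zero fitness, 0.0 for one that reaches the goal."""
    return 1.0 - fitness / max_fitness


def num_modifications(fitness, max_fitness, lo, hi, rng=random):
    """Number of structural changes to apply in a single mutation.

    The result is at least lo and strictly less than hi; the closer the
    network is to max_fitness, the closer the result stays to lo.
    """
    t = temperature(fitness, max_fitness)
    return lo + int(rng.random() * t * (hi - lo))
```

With this rule a poorly performing network may receive several additions or deletions at once, which lets the search step past a one-unit structural horizon, while a network at the goal receives only the minimum `lo`.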

Finally, as in all evolutionary computation, GNARL maintains a population of structures
during the search. This allows the algorithm to investigate several differing architectures in parallel
while avoiding over-commitment to a particular network topology.

These search dynamics, combined with GNARL’s minimal representational constraints, make
the algorithm extremely versatile. Of course, if topological constraints are known a priori, they
should be incorporated into the search. But these should be introduced as part of the task
specification rather than being built into the search algorithm. Because the only requirement on a fitness
function f is that f: S → ℝ, diverse criteria can be used to rate a network’s performance. For
instance, the first two experiments described above evaluated networks based on a desired input/
output mapping; the Tracker task experiment, however, considered overall network performance,
not specific mappings. Other criteria could also be introduced, including specific structural
constraints (e.g., minimal number of hidden units or links) as well as constraints on generalization. In
some cases, strong task restrictions can even be implicit in simple fitness functions [48].
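Because a fitness function only has to map a network to a real number, different criteria can be expressed and combined as ordinary functions. The sketch below is an illustration under stated assumptions: a network is modelled as any callable from an input vector to an output vector, and the names `Network`, `Fitness`, `mapping_fitness` and `with_size_penalty` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Stand-in for an element of S: any callable from inputs to outputs.
Network = Callable[[Sequence[float]], Sequence[float]]
Fitness = Callable[[Network], float]   # f: S -> R
Case = Tuple[Sequence[float], Sequence[float]]


def mapping_fitness(cases: Sequence[Case]) -> Fitness:
    """Fitness from a desired input/output mapping: negated sum of squared errors."""
    def f(net: Network) -> float:
        return -sum((o - t) ** 2
                    for x, target in cases
                    for o, t in zip(net(x), target))
    return f


def with_size_penalty(f: Fitness, size: Callable[[Network], int],
                      weight: float) -> Fitness:
    """Add a structural criterion by subtracting weight * size(net) from f(net)."""
    return lambda net: f(net) - weight * size(net)
```

A performance-based criterion such as the Tracker score fits the same `Fitness` type, since it also returns a single number for each network.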

The dynamics of the algorithm, guided by the task constraints represented in the fitness
function, allow GNARL to empirically determine an appropriate architecture. Over time, the continual
cycle of test-prune-reproduce will constrain the population to only those architectures that have
acquired the task most rapidly. Inappropriate networks will not be indefinitely competitive and
will be removed from the population eventually.
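The test-prune-reproduce cycle can be sketched generically as follows. This is a minimal illustration rather than GNARL itself: keeping the better half of the population unchanged and filling the remainder with mutated copies of survivors is an assumption, and `evolve`, `fitness` and `mutate` are hypothetical names. The population must contain at least two individuals.

```python
import random


def evolve(population, fitness, mutate, generations, rng):
    """Repeat test-prune-reproduce; return the final population, best first."""
    for _ in range(generations):
        # Test: rank every individual by fitness, best first.
        ranked = sorted(population, key=fitness, reverse=True)
        # Prune: keep the better half unchanged (assumed selection rule).
        survivors = ranked[: len(ranked) // 2]
        # Reproduce: refill the population with mutated copies of survivors.
        offspring = [mutate(rng.choice(survivors), rng)
                     for _ in range(len(ranked) - len(survivors))]
        population = survivors + offspring
    return sorted(population, key=fitness, reverse=True)
```

As a toy usage, individuals can be plain numbers with `fitness = lambda x: -(x - 3.0) ** 2` and `mutate = lambda x, r: x + r.gauss(0.0, 0.5)`. Because survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next.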

Complete network induction must be approached with respect to the complex interaction
between network topology, parametric values, and task performance. By fixing topology, gradient
descent methods can be used to discover appropriate solutions. But the relationship between
network structure and task performance is not well understood, and there is no “backpropagation”
through the space of network architectures. Instead, the network induction problem is approached
with heuristics that, as described above, often restrict the available architectures, the dynamics of
the search mechanism, or both. Artificial architectural constraints (such as “feedforwardness”) or
overly constrained search mechanisms can impede the induction of entire classes of behaviors,
while forced structural liberties (such as assumed full recurrence) may unnecessarily increase
structural complexity or learning time. By relying on a simple stochastic process, GNARL strikes
a middle ground between these two extremes, allowing the network’s complexity and behavior to
emerge in response to the requirements of the task.

6.0  Acknowledgments 

This research has been partially supported by ONR grants N00014-92-J-1195 and
N00014-93-1-0059. We are indebted to Ed Large, Dave Stucki and especially John Kolen for proofreading
help and discussions during the development of this research. Finally, we would like to thank our
anonymous reviewers, and the attendees of Connectfest ’92 for feedback on preliminary
versions of this work.


7.0  References 

[1] A. G. Barto. Connectionist learning for control. In W. T. Miller III, R. S. Sutton, and P. J.
Werbos, editors, Neural Networks for Control, chapter 1, pages 5-58. MIT Press,
Cambridge, 1990.

[2] T. Ash. Dynamic node creation in backpropagation networks. Connection Science,
1(4):365-375, 1989.

[3] M. Frean. The upstart algorithm: A method for constructing and training feed-forward
neural networks. Technical Report Preprint 89/469, Edinburgh Physics Dept, 1990.

[4] S. J. Hanson. Meiosis networks. In D. Touretzky, editor, Advances in Neural Information
Processing Systems 2, pages 533-541. Morgan Kaufmann, San Mateo, CA, 1990.

[5] S. E. Fahlman and C. Lebiere. The cascade-correlation architecture. In D. S. Touretzky,
editor, Advances in Neural Information Processing Systems 2, pages 524-532. Morgan
Kaufmann, San Mateo, CA, 1990.

[6] S. Fahlman. The recurrent cascade-correlation architecture. In R. Lippmann, J. Moody, and
D. Touretzky, editors, Advances in Neural Information Processing Systems 3, pages
190-196. Morgan Kaufmann, San Mateo, CA, 1991.

[7] D. Chen, C. Giles, G. Sun, H. Chen, Y. Lee, and M. Goudreau. Constructive learning of
recurrent neural networks. IEEE International Conference on Neural Networks,
3:1196-1201, 1993.

[8] M. R. Azimi-Sadjadi, S. Sheedvash, and F. O. Trujillo. Recursive dynamic node creation in
multilayer neural networks. IEEE Transactions on Neural Networks, 4(2):242-256, 1993.

[9] M. Mozer and P. Smolensky. Skeletonization: A technique for trimming the fat from a
network via relevance assessment. In D. Touretzky, editor, Advances in Neural Information
Processing Systems 1, pages 107-115. Morgan Kaufmann, San Mateo, CA, 1989.

[10] Y. L. Cun, J. Denker, and S. Solla. Optimal brain damage. In D. Touretzky, editor, Advances
in Neural Information Processing Systems 2. Morgan Kaufmann, San Mateo, CA, 1990.


The  Ohio  State  University 


July 16, 1993


24 


[11] B. Hassibi and D. G. Stork. Second order derivatives for network pruning: Optimal brain
surgeon. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural
Information Processing Systems 5, pages 164-171. Morgan Kaufmann, San Mateo, CA, 1993.

[12] C. W. Omlin and C. L. Giles. Pruning recurrent neural networks for improved generalization
performance. Technical Report No. 93-6, Computer Science Department,
Rensselaer Polytechnic Institute, April 1993.

[13] L. J. Fogel, A. J. Owens, and M. J. Walsh. Artificial Intelligence through Simulated
Evolution. John Wiley & Sons, New York, 1966.

[14] D. B. Fogel. Evolving Artificial Intelligence. Ph.D. thesis, University of California, San
Diego, 1992.

[15] J. H. Holland. Adaptation in Natural and Artificial Systems. The University of Michigan
Press, Ann Arbor, MI, 1975.

[16] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning.
Addison-Wesley Publishing Company, Inc., Reading, MA, 1989.

[17] D. B. Fogel. An introduction to simulated evolutionary optimization. This issue.

[18] A. P. Wieland. Evolving neural network controllers for unstable systems. In IEEE
International Joint Conference on Neural Networks, pages II-667 - II-673, IEEE Press, Seattle,
WA, 1990.

[19] D. Montana and L. Davis. Training feedforward neural networks using genetic algorithms.
In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence,
pages 762-767, Morgan Kaufmann, San Mateo, CA, 1989.

[20] D. Whitley, T. Starkweather, and C. Bogart. Genetic algorithms and neural networks:
Optimizing connections and connectivity. Parallel Computing, 14:347-361, 1990.

[21] R. D. Beer and J. C. Gallagher. Evolving dynamical neural networks for adaptive behavior.
Adaptive Behavior, 1(1):91-122, 1992.

[22] G. F. Miller, P. M. Todd, and S. U. Hegde. Designing neural networks using genetic
algorithms. In J. D. Schaffer, editor, Proceedings of the Third International Conference on
Genetic Algorithms, pages 379-384. Morgan Kaufmann, San Mateo, CA, 1989.

[23] R. K. Belew, J. McInerney, and N. N. Schraudolf. Evolving networks: Using the genetic
algorithm with connectionist learning. Technical Report CS90-174, University of
California, San Diego, June 1990.

[24] J. Torreele. Temporal processing with recurrent networks: An evolutionary approach. In R.
K. Belew and L. B. Booker, editors, Fourth International Conference on Genetic
Algorithms, pages 555-561. Morgan Kaufmann, San Mateo, California, 1991.

[25] M. A. Potter. A genetic cascade-correlation learning algorithm. In Proceedings of
COGANN-92 International Workshop on Combinations of Genetic Algorithms and Neural
Networks, 1992.

[26] N. Karunanithi, R. Das, and D. Whitley. Genetic cascade learning for neural networks. In
Proceedings of COGANN-92 International Workshop on Combinations of Genetic
Algorithms and Neural Networks, 1992.




[27] D. E. Goldberg. Genetic algorithms and Walsh functions: Part 2, Deception and its analysis.
Complex Systems, 3:153-171, 1989.

[28] D. E. Goldberg. Genetic algorithms and Walsh functions: Part 1, A gentle introduction.
Complex Systems, 3:129-152, 1989.

[29] J. D. Schaffer, D. Whitley, and L. J. Eshelman. Combinations of genetic algorithms and
neural networks: A survey of the state of the art. In Proceedings of COGANN-92 International
Workshop on Combinations of Genetic Algorithms and Neural Networks, 1992.

[30] G. E. Hinton, J. L. McClelland, and D. E. Rumelhart. Distributed representations. In D. E.
Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing: Explorations in
the Microstructure of Cognition, volume 1: Foundations, pages 77-109. MIT Press,
Cambridge, MA, 1986.

[31] T. J. Sejnowski and C. R. Rosenberg. Parallel networks that learn to pronounce English text.
Complex Systems, 1:145-168, 1987.

[32] J. Koza and J. Rice. Genetic generation of both the weights and architecture for a neural
network. In IEEE International Joint Conference on Neural Networks, pages II-397 - II-404,
Seattle, WA, IEEE Press, 1991.

[33] R. Collins and D. Jefferson. An artificial neural network representation for artificial
organisms. In H. P. Schwefel and R. Männer, editors, Parallel Problem Solving from Nature.
Springer-Verlag, 1991.

[34] D. B. Fogel. A brief history of simulated evolution. In D. B. Fogel and W. Atmar, editors,
Proceedings of the First Annual Conference on Evolutionary Programming, Evolutionary
Programming Society, La Jolla, CA, 1992.

[35] D. B. Fogel, L. J. Fogel, and V. W. Porto. Evolving neural networks. Biological Cybernetics,
63:487-493, 1990.

[36] J. R. McDonnell and D. Waagen. Determining neural network connectivity using
evolutionary programming. In Twenty-fifth Asilomar Conference on Signals, Systems, and
Computers, Monterey, CA, 1992.

[37] D. B. Fogel. Using evolutionary programming to create neural networks that are capable of
playing Tic-Tac-Toe. In International Conference on Neural Networks, pages 875-880.
IEEE Press, San Francisco, CA, 1993.

[38] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing.
Science, 220:671-680, 1983.

[39] R. J. Williams. Adaptive State Representation and Estimation Using Recurrent
Connectionist Networks, chapter 4, pages 97-114. MIT Press, Cambridge, MA, 1990.

[40] J. B. Pollack. The induction of dynamical recognizers. Machine Learning, 7:227-252, 1991.

[41] M. Tomita. Dynamic construction of finite automata from examples using hill-climbing. In
Proceedings of the Fourth Annual Conference of the Cognitive Science Society, pages
105-108, Ann Arbor, MI, 1982.




[42] R. L. Watrous and G. M. Kuhn. Induction of finite-state automata using second-order
recurrent networks. In Advances in Neural Information Processing 4. Morgan Kaufmann, San
Mateo, CA, 1992.

[43] C. L. Giles, G. Z. Sun, H. H. Chen, Y. C. Lee, and D. Chen. Higher order recurrent networks
& grammatical inference. In D. S. Touretzky, editor, Advances in Neural Information
Processing Systems 2, pages 380-387. Morgan Kaufmann, San Mateo, CA, 1990.

[44] C. L. Giles, C. B. Miller, D. Chen, G. Z. Sun, H. H. Chen, and Y. C. Lee. Extracting and
learning an unknown grammar with recurrent neural networks. In Advances in Neural
Information Processing 4. Morgan Kaufmann, San Mateo, CA, 1992.

[45] Z. Zeng, R. M. Goodman, and P. Smyth. Learning finite state machines with self-clustering
recurrent networks. Neural Computation, to appear.

[46] D. Jefferson, R. Collins, C. Cooper, M. Dyer, M. Flowers, R. Korf, C. Taylor, and A. Wang.
Evolution as a theme in artificial life: The genesys/tracker system. In C. G. Langton, C.
Taylor, J. D. Farmer, and S. Rasmussen, editors, Artificial Life II: Proceedings of the Workshop
on Artificial Life, pages 549-577. Addison-Wesley, 1991.

[47] J. Koza. Genetic evolution and co-evolution of computer programs. In C. G. Langton, C.
Taylor, J. D. Farmer, and S. Rasmussen, editors, Artificial Life II. Addison Wesley
Publishing Company, Reading, Mass., 1992.

[48] P. J. Angeline and J. B. Pollack. Competitive environments evolve better solutions for
complex tasks. In S. Forrest, editor, Genetic Algorithms: Proceedings of the Fifth International
Conference (GA93), Morgan Kaufmann, San Mateo, CA, 1993.




